Feat/peer metadata display #21

Merged
robmsmt merged 9 commits into main from feat/peer-metadata-display
May 18, 2026

Conversation

@robmsmt robmsmt commented May 18, 2026

No description provided.

robmsmt added 9 commits May 17, 2026 16:12
Backend (model_service): pass through hostname, version, status, labels,
and convenience pulls (launched_by, slurm_job_id, worker_group_id,
framework, started_at) for each DNT peer. Also surface metrics-only
follower peers (no service, but with worker_group_id) so multi-node
replicas can be reconstructed in aggregation.
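
A minimal sketch of the backend pass-through described above, in Python. The payload shape (a per-peer dict with `service` and `labels` keys), the function name `summarise_peer`, and the `is_follower` flag are assumptions made for illustration; the actual model_service code is not shown in this PR text.

```python
from typing import Any, Optional

# Label keys the commit message lists as "convenience pulls".
CONVENIENCE_KEYS = (
    "launched_by",
    "slurm_job_id",
    "worker_group_id",
    "framework",
    "started_at",
)


def summarise_peer(peer_id: str, peer: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Flatten one DNT peer entry; return None for peers that carry nothing useful."""
    labels = peer.get("labels") or {}
    has_service = bool(peer.get("service"))
    # Metrics-only followers have no service but still carry a worker_group_id,
    # so they are kept; peers with neither are dropped (assumed behaviour).
    if not has_service and not labels.get("worker_group_id"):
        return None
    summary: dict[str, Any] = {
        "peer_id": peer_id,
        "hostname": peer.get("hostname"),
        "version": peer.get("version"),
        "status": peer.get("status"),
        "labels": labels,
        "is_follower": not has_service,  # hypothetical field name
    }
    for key in CONVENIENCE_KEYS:
        summary[key] = labels.get(key)
    return summary
```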

Frontend (ModelList): group raw peers by worker_group_id to count
replicas distinctly from peers/nodes. Headline now reads
"Available Models X, Replicas Y" when the two diverge.
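
The grouping step can be sketched as follows. This is written in Python for illustration only (the real ModelList code lives in the frontend and is not shown here); the `peer_id` key and the `solo:` fallback for peers without a `worker_group_id` are assumptions.

```python
from collections import defaultdict
from typing import Any


def group_replicas(peers: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group peers into replicas keyed by worker_group_id."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for index, peer in enumerate(peers):
        # Assumption: a peer without worker_group_id is its own single-node replica.
        key = peer.get("worker_group_id") or f"solo:{peer.get('peer_id', index)}"
        groups[key].append(peer)
    return dict(groups)
```

With this shape, the replica count is `len(group_replicas(peers))`, while the peer/node count remains `len(peers)`.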

Frontend (ModelCard): clicking the card now expands inline instead of
opening OpenWebUI. The expansion shows:
  - Open in OpenWebUI button (the prior click behaviour)
  - Per-replica monospace block with model, launched_by, slurm_job_id,
    started_at, framework, version, head + follower hostnames
  - Topology header (e.g. "2 nodes × 4x GH200")
  - Per-replica extra-labels block for anything else OCF carries

Fixtures: a snapshot of the live prod /dnt/table, plus a script that
synthesises the post-v0.0.6 shape (hostname/version/status/labels) and
adds a multi-node replica demo (shared worker_group_id, one head + one
metrics-only follower) so the new code paths have realistic test data.

Settings: ocf_head_addr → otela_head_addr, ocf_fixture_path → otela_fixture_path.
DEPLOY NOTE: this changes the env var names from OCF_HEAD_ADDR /
OCF_FIXTURE_PATH to OTELA_HEAD_ADDR / OTELA_FIXTURE_PATH. Ops side must
update before/with this deploy or /v1/models* will hit an empty endpoint.

Comments, README, k8s manifests, and the in-repo guides now refer to
"OpenTela" rather than "OCF". External image tag
(ghcr.io/researchcomputer/ocf:*) and the in-binary mount path
(/root/.ocfcore/keys) are untouched — both are dictated by upstream and
would need a coordinated rename there first.

Fixture mode: when OTELA_FIXTURE_PATH is set, /v1/models* reads that
JSON file instead of issuing an HTTP GET to OTELA_HEAD_ADDR/v1/dnt/table.
Used for iterating on the new model-card expansion UI against the
synthesised post-v0.0.6 payload before the binary actually ships.
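
A rough sketch of fixture mode, assuming the two settings are passed in as plain strings. The function name `load_dnt_table` and the 10-second timeout are hypothetical; only the `/v1/dnt/table` path and the "file wins over HTTP" rule come from the commit message.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any, Optional


def load_dnt_table(head_addr: str, fixture_path: Optional[str] = None) -> Any:
    """Return the DNT table from a fixture file if configured, else over HTTP."""
    if fixture_path:
        # Fixture mode: no network access at all.
        return json.loads(Path(fixture_path).read_text(encoding="utf-8"))
    url = head_addr.rstrip("/") + "/v1/dnt/table"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)
```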

Use pydantic AliasChoices so OCF_HEAD_ADDR / OCF_FIXTURE_PATH still
populate the renamed settings. Deployments can migrate on their own
schedule without a synchronized cut-over.

When both legacy and canonical names are set, the canonical OTELA_*
wins — a partial migration shouldn't silently keep the legacy value in
force.
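
The precedence rule can be illustrated without pydantic. The sketch below is not the code from this commit (which uses AliasChoices); it only mirrors the intended outcome, where names are tried in order and the first one that is set wins. `resolve_setting` and `HEAD_ADDR_NAMES` are hypothetical names.

```python
from typing import Mapping, Optional, Sequence

# Canonical name first, legacy name second.
HEAD_ADDR_NAMES = ("OTELA_HEAD_ADDR", "OCF_HEAD_ADDR")


def resolve_setting(env: Mapping[str, str], names: Sequence[str]) -> Optional[str]:
    """Return the value of the first name in `names` that is present in `env`."""
    for name in names:
        value = env.get(name)
        if value is not None:
            return value
    return None
```

Calling `resolve_setting(os.environ, HEAD_ADDR_NAMES)` would then pick the canonical variable whenever it is set and fall back to the legacy one otherwise.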

Upgraded DNT fixture now includes framework_args (the second monospace
block in the card expansion), expires_at (SLURM time limit applied to
started_at), slurm_reservation (mixed in for some launches), and
varied started_at values spread across several hours so the UI shows
a realistic mix of ages.
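
As an illustration of how expires_at relates to started_at, here is a small sketch. It assumes started_at is an ISO 8601 timestamp with an explicit UTC offset and that the SLURM time limit is already available as a `timedelta`; neither detail is stated in the commit, and `compute_expires_at` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def compute_expires_at(started_at: str, time_limit: timedelta) -> str:
    """Add the job's time limit to its start time and return an ISO 8601 string."""
    return (datetime.fromisoformat(started_at) + time_limit).isoformat()


# A job started at 16:12 UTC with a 12-hour limit expires at 04:12 UTC the next day.
example = compute_expires_at("2026-05-17T16:12:00+00:00", timedelta(hours=12))
```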

Note: framework_args isn't in the opentela --label set we have shipped
so far; this fixture anticipates a planned follow-up patch there. Until
that ships, real prod data won't carry framework_args, and serving-api
just hides the row.

ModelCard expansion: filter out empty rows so the legacy / pre-v0.0.6
case shows just what's known (peer ids + model) instead of a wall of
"?" placeholders. When no labels exist at all, render a small amber
hint pointing at the v0.0.6 requirement instead of silently rendering
an empty card.
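
The row filtering and the hint condition might look like the following. This is a Python sketch of the logic only (the real ModelCard is frontend code not shown here); the function names and the exact definition of "empty" are assumptions.

```python
from typing import Any, Optional


def is_empty(value: Any) -> bool:
    """Treat None, blank strings, and empty containers as 'nothing to show'."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def expansion_rows(fields: dict[str, Any]) -> list[tuple[str, str]]:
    """Keep only the rows that have a value, preserving the original order."""
    return [(name, str(value)) for name, value in fields.items() if not is_empty(value)]


def needs_upgrade_hint(labels: Optional[dict[str, Any]]) -> bool:
    """Show the v0.0.6 hint only when the peer carries no labels at all."""
    return not labels
```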

`make dummy-run`: same as `make run`, but forces OTELA_FIXTURE_PATH to
point at the synthesised upgraded fixture, so the model card UI shows the
v0.0.6-shape payload (hostname, version, labels, multi-node demo) without
depending on live prod state or whatever's in the developer's .env.

Use `make run` to hit live prod, `make dummy-run` to iterate on the UI.
@robmsmt robmsmt merged commit 0582bc0 into main May 18, 2026
2 checks passed
@robmsmt robmsmt deleted the feat/peer-metadata-display branch May 18, 2026 11:55